Computational Biology and Chemistry
Elsevier BV
Preprints posted in the last 30 days, ranked by how well they match Computational Biology and Chemistry's content profile, based on 23 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.
Sindhi, N. A.; Pawar, N.; Dixson, J.; Garcia, D.
Predicting protein-protein interactions is a fundamental problem in molecular biology. Experimental approaches for identifying protein-protein interactions are time-consuming and labor-intensive, motivating the development of efficient computational alternatives, including machine learning-based methods. However, conventional machine learning methods often rely on manually engineered features that require substantial domain expertise. In this study, we propose a two-stage framework to address these limitations. In the first stage, a one-dimensional convolutional neural network autoencoder is used to automatically learn latent representations from protein sequences. The quality of these features is evaluated through reconstruction error, reflecting how accurately the model reconstructs the original sequence. In the second stage, these learned features are combined with amino acid frequency-based features to form a hybrid feature set for predicting protein-protein interactions. A systematic comparison is performed between models trained on frequency features alone and those using the hybrid representation. The comparison showed that incorporating one-dimensional convolutional neural network-derived latent features improved the models' performance in predicting protein-protein interactions. The dataset was split into training, validation, and test sets. Nested cross-validation was employed, with inner loops for hyperparameter tuning and outer loops for model selection. The random forest classifier achieved the best performance, with a mean area under the receiver operating characteristic curve (ROC-AUC) of 0.91 and a test F1-score of 0.87. These results highlight the effectiveness of integrating deep feature learning with ensemble methods for predicting protein-protein interactions and build upon previous work focused on this fundamental problem. Author Summary: Protein-protein interactions are fundamental in all biological processes.
However, predicting these interactions is a key problem in molecular biology. Computational approaches have been tested to address this problem. We applied a mix of machine learning and deep learning to gain insight into the qualities of proteins that engage in interaction. First, we trained a deep learning model, which automatically learned the primary sequence and characters related thereto, reducing bias in the actual prediction process. We combined these features, or latent representations, with amino acid frequency features of protein sequences, and called the two together "hybrid features." Then we performed a systematic comparison of amino acid frequency features-only with hybrid features, among four different machine learning classifiers. Our results suggest that the random forest classifier performed best among all four classifiers at predicting interactions between proteins. We propose that this approach could be used to improve efficiency in testing protein-protein interactions at the bench and may have applications to other biologically relevant molecular interactions.
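The hybrid feature set described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `aa_frequencies` and `hybrid_features` are hypothetical, the latent vectors are placeholders standing in for the autoencoder output (which is not implemented here), and concatenating the two proteins' features is an assumption about how a protein pair is represented.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes (fixed feature order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_frequencies(sequence):
    """Return the 20 relative amino acid frequencies of a protein sequence.

    Characters outside the 20 standard residues are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def hybrid_features(seq_a, seq_b, latent_a, latent_b):
    """Concatenate frequency features of both proteins with their latent vectors.

    latent_a / latent_b are assumed to come from a trained sequence
    autoencoder; here they are just lists of floats supplied by the caller.
    """
    return (
        aa_frequencies(seq_a)
        + aa_frequencies(seq_b)
        + list(latent_a)
        + list(latent_b)
    )
```

A vector built this way could be passed to any standard classifier; the frequency-only baseline in the comparison would correspond to calling `hybrid_features` with empty latent vectors.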
Mostert, B.; Judd, R.; Makris, T.; Xie, D.
Artemisinin is an effective antimalarial drug sourced from Artemisia annua, but its low and variable yields require enhancement either semi-synthetically or in planta to meet the global demand for treatment. Though essential enzymes have been identified in the artemisinin biosynthetic pathway, including an essential cytochrome P450 monooxygenase (CYP71AV1), there are still many unknowns. Cytochrome P450 reductase 1 (herein, AaCPR1) has been experimentally confirmed as an electron transfer partner for CYP71AV1 in its three-step oxygenation of key artemisinin precursors. However, the recent discovery of a highly related CPR, herein AaCPR2, introduces the possibility that another, potentially more catalytically favourable, interaction could exist for CYP71AV1. Therefore, enzyme kinetics and differential scanning fluorimetry (DSF) were used to characterise both AaCPR1 and AaCPR2 and to determine the existence and source of their catalytic differences. Enzyme activity assays across a range of cytochrome c and NADPH concentrations revealed that AaCPR1 had lower Km and higher kcat/Km values, while AaCPR2 had higher Vmax and kcat values. This suggests that AaCPR1 is more effective at reducing cytochrome c when substrate conditions are limiting, whereas AaCPR2 is more effective than AaCPR1 at reducing cytochrome c when substrate conditions are saturating. This implies a functional partitioning of the two enzymes on the basis of substrate availability. The DSF results provided deeper insight into the different protein-ligand interactions between the two enzymes. AaCPR2 reached lower maximum melting temperatures across all tested conditions, whereas AaCPR1 had higher maximum melting temperatures. Thus, AaCPR1 exhibits higher thermal stability and has a higher temperature threshold than AaCPR2. This contributes to the notion that the AaCPRs are also functionally divergent on the basis of temperature.
The cumulative differences in melting behaviour between the two enzymes led to the hypothesis that AaCPR1 and AaCPR2 exhibit different domain motions that may lead to preferential catalysis for one redox partner over another. This was further supported by the prediction of a highly variable loop region between the two enzymes at the connecting domain just after the flexible hinge. If such loops are highly mobile, as predicted, then the residue differences therein could provide a bio-structural basis for the kinetic and thermal/biophysical differences observed between AaCPR1 and AaCPR2. These data indicate that AaCPR1 and AaCPR2 possess fundamental biophysical differences despite their high degree of relatedness. Ultimately, these differences suggest differential metabolic functions of the two enzymes in artemisinin biosynthesis and/or other important secondary metabolic processes.
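The kinetic argument in this abstract (one enzyme favoured when substrate is limiting, the other when it is saturating) follows from the Michaelis-Menten equation. The sketch below illustrates it with made-up parameters; `ENZYME_1` and `ENZYME_2` are hypothetical values in arbitrary units chosen only to reproduce the qualitative pattern, not measurements for AaCPR1 or AaCPR2.

```python
def mm_rate(s, vmax, km):
    """Michaelis-Menten rate: v = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)


def crossover_concentration(vmax1, km1, vmax2, km2):
    """Substrate concentration at which two Michaelis-Menten curves intersect.

    Setting the two rate expressions equal gives
    [S]* = (Vmax1 * Km2 - Vmax2 * Km1) / (Vmax2 - Vmax1).
    Returns None when the curves do not cross at a positive concentration.
    """
    if vmax1 == vmax2:
        return None
    s = (vmax1 * km2 - vmax2 * km1) / (vmax2 - vmax1)
    return s if s > 0 else None


# Hypothetical parameters (arbitrary units), NOT values from the study.
ENZYME_1 = {"vmax": 1.0, "km": 5.0}   # lower Km, higher Vmax/Km
ENZYME_2 = {"vmax": 2.0, "km": 20.0}  # higher Vmax, lower Vmax/Km

if __name__ == "__main__":
    for s in (1.0, 10.0, 100.0):
        v1 = mm_rate(s, **ENZYME_1)
        v2 = mm_rate(s, **ENZYME_2)
        print(f"[S]={s:>6}: enzyme 1 v={v1:.3f}, enzyme 2 v={v2:.3f}")
    print("crossover [S]:", crossover_concentration(1.0, 5.0, 2.0, 20.0))
```

With these assumed numbers, enzyme 1 is faster below the crossover concentration and enzyme 2 is faster above it, which is the kind of partitioning by substrate availability the abstract describes.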
Andueza, M.; Villoslada-Blanco, P.; De Dreuille, B.; Alonso, L.; Sabroso-Lasa, S.; Pantel, K.; Alix-Panabieres, C.; Lopez de Maturana, E.; Malats, N.
Cancer is a major global health issue with rising incidence and mortality. Early detection, tumor characterization, and disease surveillance are crucial for timely and effective treatment, ultimately reducing mortality rates. Liquid biopsy (LB) has emerged as a valuable detection tool offering a non-invasive method to determine tumor-derived biomarkers in body fluids with demonstrated translational potential. To increase biomarker sensitivity, high-throughput sequencing platforms deliver massive volumes of data. Artificial Intelligence (AI) is pivotal in enabling the integration of large and complex data. This contribution aims to assess the current state of integrative AI-based research in the LB field and provide methodological guidance. First, we conducted a PubMed search and found that the literature is sparse in studies integrating LB features, particularly by applying AI. When adopting the latter approach, defining the study objectives is crucial to guide the subsequent methodological aspects, including study design, patient selection criteria, sample size, nature of the LB features, and metadata to collect. Specifically, we propose strategies and tools for data preprocessing, including normalization and batch correction, as well as handling outliers and missing data. Furthermore, we recommend various machine and deep learning approaches for feature selection to ensure model robustness, and we highlight the importance of rigorous internal and external validation of the selected models. Assessing clinical utility and interpretability is often overlooked but fundamental for real-world implementation. In conclusion, we provide the LB scientific community with AI-based methodological guidance to bridge the two fields and enhance the integrative analysis of LB features. Graphical abstract: Workchart for multiomics integrative studies in the liquid biopsy field.
Note: CTCs, circulating tumor cells; ctDNA, circulating tumor DNA; TEPs, tumor-educated platelets; miRNA, microRNA; cfRNAs, cell-free RNAs.
Siddiqi, M. A.; Kumar, H.; Mazumder, M.
Influenza A virus (IAV) causes significant morbidity and mortality worldwide. Understanding how viral RNAs may regulate host genes through microRNA-like mechanisms can clarify pathogenesis and reveal therapeutic targets. In this study, we screened all eight IAV H3N2 RNA segments (PB2, PB1, PA, HA, NP, NA, M, and NS) using an ab initio computational pipeline; five segments (PB2, PB1, PA, HA, and M) met the VMir scoring threshold for further analysis, while NP, NA, and NS were excluded due to low pre-miRNA scores. Mature miRNAs were identified using MatureBayes, and target genes in the human genome were predicted with the miRDB server. From these targets, we selected two genes per qualifying segment (10 genes total) based on their functional relevance to influenza infection and supporting literature; all selected genes are unique to their respective segment. We identified 10 segment-specific target genes (IFNL1, DDX60, SAMHD1, MAVS, IRF4, BIRC2, AGO1, MAP3K1, NOD1, and TNFAIP1) and one common target across all five analyzed segments (CADM2). Gene Ontology and pathway analyses showed enrichment in interferon signaling, RIG-I-like receptor pathways, antiviral restriction, RNA interference, and inflammatory responses. Literature supports roles for these genes in pulmonary and antiviral innate immunity. Our findings provide a basis for experimental validation and may help the research community better understand influenza virus pathogenesis and identify novel therapeutic candidates.
Shivakumar, A.; Hunt, A. G.; Chakrabarti, M.
Hemp (Cannabis sativa) produces a wide array of medicinally significant compounds, including cannabidiol (CBD). These compounds are predominantly synthesized in female hemp inflorescences. The proposed research utilizes next-generation sequencing-based transcriptome analysis using a 3′-end-directed approach to identify differentially expressed genes between male and female hemp plants at the early vegetative stage. A total of 886 differentially expressed genes (DEGs) were identified, a majority of which were upregulated in males compared to females. We hypothesized that alternative RNA processing contributes to sex-specific gene expression. To this end, 932 genes were identified that exhibited significant changes in poly(A) site usage when comparing males and females. These genes were much more likely to be differentially expressed, which supports this hypothesis. Males tend to have longer 3′ UTRs with canonical motifs found in the Near-Upstream Elements (NUE), compared to the shorter 3′ UTRs in females, which have A-rich motifs near the cleavage site. This suggests that polyadenylation remodels hemp mRNAs, with distal poly(A) sites being preferred in males. To further investigate when this sex-specific gene expression program is established, RNA was isolated from plants at various developmental stages, such as developing seeds, four-day-old seedlings, and different developmental stages up to four weeks after sowing. Diagnostic male-specific genes were analyzed using RT/PCR. The results indicate that sex-specific gene expression is not evident in seeds but rather is set during or after germination. Significance: Hemp males tend to have longer 3′ UTRs with canonical motifs found in the Near-Upstream Elements (NUE), compared to the shorter 3′ UTRs in females, which have A-rich motifs near the cleavage site. The sex-specific gene expression program is not yet established in mature seed but is set in the time between germination and 4 days of growth.
Gao, H.; Shen, J.; Chen, D.; Mol, B. W.; Hun, W.; Liang, Z.; Bai, X.; Han, X.; Zhu, J.; Wang, H.; Liu, X.; Su, C.; Weng, R.; Liu, Y.; Li, W.; Zhang, D.
Introduction: The ARRIVE trial first demonstrated that elective induction of labour (IOL) at 39 weeks in low-risk pregnancies reduced the likelihood of caesarean section (CS) without compromising perinatal safety; however, the generalizability of these findings remains debated, leading to uncertainty in clinical practice. The LIRIC trial aims to evaluate whether 39-week elective IOL reduces CS rates compared with expectant management, while exploring its impact on infant neurodevelopment and multi-omics profiles. Methods and analysis: This is a single-centre, open-label, randomized controlled trial in China. A total of 1,074 low-risk pregnant women (nulliparous or multiparous) will be randomly assigned (1:1 ratio) to either 39-week IOL or expectant management. The primary outcome is the CS rate. Secondary outcomes include a composite of severe neonatal morbidity and perinatal mortality and infant neurodevelopmental scores (Bayley-4 and ASQ-3), among others. Data analysis will follow the intention-to-treat (ITT) principle. Biospecimens will be collected for metagenomic and metabolomic analyses, with results to be reported separately. Ethics and dissemination: The protocol has been approved by the Ethics Committee of Women's Hospital, School of Medicine, Zhejiang University. Informed consent will be obtained from all participants. Results will be disseminated via peer-reviewed journals, and standardized infant developmental reports will be provided to participants to enhance study benefit. Trial registration number: NCT07082530.
Jaber, N.; Di Somma, A.; Rodriguez-alfonso, A. A.; Cane, C.; Read, C.; Ständker, L.; Wiese, S.; Duilio, A.; Münch, J.; Spellerberg, B.
Background: Rising antimicrobial resistance rates require new therapeutic approaches such as antimicrobial peptides (AMPs), which are part of the innate immune defense, as alternatives to antibiotics. In this study, we aim to unravel the antibacterial activity of human histone H1.2 peptide against Pseudomonas aeruginosa and its potential immune modulatory role. Methods: We used a hemofiltrate peptide database for antimicrobial peptide prediction to identify novel human AMPs. Thirteen sequences of histone H1 were identified as putative AMPs, synthesized, and tested against bacterial ESKAPE pathogens in a radial diffusion assay. SYTOX green assay, electrophoretic mobility shift assay, and differential proteomics assays were conducted to determine the mode of action of the H1.2 peptide fragment. A crystal violet assay was performed to evaluate the inhibition of biofilm formation. The cytotoxicity of the peptide was tested in LDH and Alamar assays. Finally, to visualize the contribution of H1.2 to NET formation, scanning electron microscopy was performed. Results: The H1.2 peptide inhibited the growth of P. aeruginosa in a dose- and pH-dependent manner without cytotoxicity towards mammalian THP-1 cells. It acts on intracellular targets to inhibit the growth of P. aeruginosa. STRING analysis from the differential proteomics assay showed that H1.2 downregulates proteins involved in the biogenesis of outer membrane proteins, including the folding and trafficking of outer membrane proteins across the cytoplasmic membrane. Scanning electron microscopy images showed that H1.2 forms NET-like structures capable of trapping and immobilizing P. aeruginosa. Conclusion: The characterized antimicrobial activity of H1.2 points to a role for human histone H1 fragments in innate immunity and may represent a promising approach for the development of novel antibacterial therapies.
Graphical Summary: Sec transport and BAM complex systems, including chaperone proteins and quality control proteases, are inhibited by H1.2 in Pseudomonas aeruginosa. Outer membrane proteins (OMPs) are synthesized in the cytoplasm and transported across the inner membrane via the Sec translocase, assisted by SecA/SecB or ribosomes. In the periplasm, they are escorted by chaperones such as SurA to the BAM complex for insertion into the outer membrane. Here, we show that H1.2, an antimicrobial peptide, targets membrane biogenesis in P. aeruginosa by downregulating the Sec translocase (SecA/SecB and SecYEG), SurA, and the BAM complex, leading to improper transfer, folding and insertion of OMPs into the outer membrane. Normally, misfolded proteins are degraded by the protease MucD to prevent toxic aggregation in the bacteria. However, with H1.2 inhibiting MucD, the proteotoxic stress is exacerbated, ultimately compromising bacterial homeostasis and viability. Figure created using BioRender.com.
Faleel, D.; Arnest, R.; Aradhyula, V.; Boyapalli, S.; Haller, S. T.; Kennedy, D. J.
The Na+/K+-ATPase (NKA) regulates ion balance in the kidney and influences cellular processes like proliferation and apoptosis through its signal transduction. The endogenous ligand 20-hydroxyeicosatetraenoic acid (20-HETE) contributes to inflammation and fibrosis in chronic kidney disease (CKD) and inhibits NKA activity in renal tubules. However, the molecular mechanism of this interaction remains unclear. In this study, we used an in silico approach to investigate the potential interaction between 20-HETE and NKA. Various ligands, including known NKA ligands such as cardiotonic steroids (CTS), 20-HETE, and negative controls, were docked using rigid and induced fit docking to predict the affinity of the ligands toward NKA. Binding free energy calculations with the Prime molecular mechanics with generalized Born and surface area (Prime MM/GBSA) tools were used to confirm the involvement of key amino acids in ligand-receptor interactions. The docking analyses revealed that 20-HETE exhibited a binding affinity comparable to that of the negative control, with some differences between rigid and induced fit docking. Binding free energy data highlighted key amino acids in the 20-HETE and NKA interaction. In the interaction fingerprint and mutation analyses, mutations such as Ala330Gly and Val329Ala significantly reduced binding free energy, while Thr804Ala showed a notable decrease, underscoring the potential importance of these amino acids in ligand stabilization. These findings provide computational evidence supporting a potential direct interaction between 20-HETE and NKA and identify candidate residues for future experimental validation.
Ye, X.; Zhou, S.; Chen, X.; Hu, C.; Hu, H.; Ding, J.; Teng, W.
Colorectal cancer (CRC) poses a severe global health threat with high incidence, mortality, and poor 5-year survival rates for advanced cases despite existing treatments. This study aims to explore the role of STRIP2 in CRC progression and its underlying mechanisms. The impact of STRIP2 on CRC in vitro was investigated via CRC cell proliferation, migration, invasion, and apoptosis assays. The in vivo impact was investigated in nude mouse models. The role of STRIP2 in CRC was investigated via transcriptomic analysis, Western blot, co-immunoprecipitation assays and ferroptosis validations. STRIP2 is overexpressed in CRC, driving malignant phenotypes in vitro and in vivo. Mechanistically, STRIP2 stabilizes the IL17 downstream effector LCN2 by blocking its K48-linked ubiquitination and degradation, thereby enhancing the ferroptosis resistance of CRC cells. Oe-STRIP2 suppresses ferroptosis, boosting proliferation and reducing oxidative stress, while si-STRIP2 induces the opposite effect. This study suggests that STRIP2-mediated stabilization of LCN2 enhances the ferroptosis resistance of CRC cells, thereby promoting CRC cell survival and malignant progression, which provides a novel link between STRIP2 and ferroptosis regulation in CRC. Highlights: STRIP2 is overexpressed in CRC tissues and cells. STRIP2 blocks LCN2 ubiquitination and stabilizes LCN2. STRIP2 suppresses CRC ferroptosis. STRIP2 drives CRC malignant phenotypes both in vitro and in vivo.
Xiao, F.; Qin, F.; Luo, X.; Slewitzke, S. E.; Fernandes, G. F.; Johansson, M.; Xiao, X.; Zaridze, D.; Bojesen, S. E.; Shete, S.; Albanes, D.; Aldrich, M. C.; Tardon, A.; Fernandez-Tardon, G.; Le Marchand, L.; Rennert, G.; Bickeböeller, H.; Wichmann, H.-E.; Risch, A.; Muley, T.; Rosenberger, A.; Field, J. K.; Davies, M.; Woll, P.; Kiemeney, L. A.; Haugen, A.; Zienolddiny, S.; Lam, S.; Johansson, M.; Grankvist, K.; Schabath, M. B.; Andrew, A.; Lazarus, P.; Arnold, S. M.; Zhu, D.; Brenner, H.; Neuhouser, M. L.; Hung, R. J.; Christiani, D. C.; McKay, J.; Cai, G.; Xia, J.; Amos, C. I.
Background: Genome-wide association studies (GWAS) have identified numerous lung cancer susceptibility loci based on single nucleotide polymorphisms (SNPs), yet a substantial proportion of heritability remains unexplained. We therefore evaluated germline copy number variants (CNVs) as an underexplored source of genetic susceptibility and potential contributors to genomic instability in lung cancer. Methods: We conducted a genome-wide analysis of germline CNVs using 19,342 cases and 15,917 controls from the Transdisciplinary Research in Cancer of the Lung (TRICL) consortium, with replication in two independent cohorts. High-confidence CNVs were identified by integrating two CNV callers including PennCNV and modSaRa2. Association analyses were performed using both gene-based and CNV region-based approaches. Polygenic risk scores (PRS) were constructed from top loci, and functional validation was conducted using siRNA-mediated knockdown in lung fibroblast cells. Results: We identified CNVs in four genomic regions (1p36.22, 2q31.2, 6p21.32, and 19q13.32) significantly associated with lung cancer risk. Two loci (1p36.22 and 2q31.2) were consistently supported across both analytical strategies. A CNV-based PRS constructed from key genes (CLCN6, NFE2L2, OPA3, and PSMB8) was significantly associated with lung cancer risk and replicated across independent datasets. Functional assays demonstrated that knockdown of NFE2L2 and OPA3 increased endogenous DNA damage, supporting a role in genomic stability. Conclusions: Germline CNVs contribute to lung cancer susceptibility and may influence carcinogenesis through mechanisms related to genomic instability. Impact: These findings expand the genetic architecture of lung cancer and highlight CNVs as potential biomarkers for improving risk stratification and informing precision prevention strategies.
Kostareva, O. S.; Eliseeva, I. A.; Buyan, A. I.; Lyabin, D. N.; Tishchenko, S. V.; Mikhaylina, A. O.
Nucleobindin 1 (NUCB1) is a multifunctional conserved protein found in Golgi luminal, nuclear, extracellular and cytosolic pools. NUCB1 is a multidomain protein comprising a signal peptide, a DNA-binding domain, a leucine zipper and a Ca2+-binding domain. The multiple domains and localizations of NUCB1 enable its interactions with various partners, such as DNA, Gi3 protein, cyclooxygenase 2, LRP10 and RNA, suggesting its importance in the regulation of many cellular events. We show that NUCB1 contains three RNA-binding regions and is able to interact with two RNA fragments. We suggest possible ways in which NUCB1 could participate in the interaction of two partially complementary RNAs. The RNA-binding properties of NUCB1 were also confirmed in in vivo experiments.
Brown, S. M.; Hervey, J.; Dean, S. N.; Vora, G. J.
The standard set of 20 genetically-encoded amino acids (C20) exhibits a statistically non-random distribution in primarily two structurally-relevant physicochemical properties: hydrophobicity and molecular volume, and to a lesser extent charge. It remains an open question, however, whether evolutionary pressures similarly optimized the same alphabet for the distribution of functionally-relevant properties, such as reactivity. In this study, we used semi-empirical quantum chemistry simulations to calculate the highest occupied molecular orbital and lowest unoccupied molecular orbital (HOMO-LUMO) gaps for 84 xeno amino acids and constructed 10 million random 20-mer amino acid alphabets to determine where C20 fits amongst this background. The HOMO-LUMO gap calculations demonstrated that C20 also exhibits a non-random distribution in this property, as it does for hydrophobicity and volume. However, unlike hydrophobicity and volume, this distribution is non-random across an unevenly broad range. The results expand upon previous theory and suggest HOMO-LUMO gap energies as one property synthetic biologists may consider when developing novel protein design tools or designing functional xeno amino acid alphabets. Highlights: Life's amino acid alphabet is non-randomly distributed within an expanded, computationally generated chemistry space derived from large-scale quantum chemistry simulations. Amino acid alphabet coverage theory applies beyond structurally-relevant physicochemical descriptors to include functionally-relevant properties like reactivity as measured by frontier molecular orbitals. Findings here provide a theoretical framework to guide the design of novel proteins and development of synthetic amino acid alphabets.
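The comparison of one alphabet against a background of random alphabets can be sketched as a simple sampling test. This is a hedged illustration only: the statistic used here (`spread`, the range of property values) is a stand-in chosen for simplicity, the function name `fraction_at_least` is hypothetical, and the abstract does not state which coverage statistic or candidate pool the authors actually used.

```python
import random


def spread(values):
    """Range of a set of property values (a stand-in coverage statistic)."""
    return max(values) - min(values)


def fraction_at_least(pool, reference, n_samples=10000, seed=0):
    """Fraction of random alphabets whose spread is >= the reference's spread.

    pool      : property values (e.g. HOMO-LUMO gaps) of all candidate amino acids
    reference : property values of the alphabet being evaluated
    Random alphabets have the same size as the reference and are drawn
    from the pool without replacement.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    k = len(reference)
    target = spread(reference)
    hits = sum(
        1 for _ in range(n_samples) if spread(rng.sample(pool, k)) >= target
    )
    return hits / n_samples
```

A small returned fraction would indicate that few random alphabets cover the property range as broadly as the reference set; the study's background of 10 million alphabets corresponds to a much larger `n_samples` than the default here.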
Davis, W. J. H.; Thompson, M.; Farry, S. M.; McKinney, C.; Gimenez, G.; Hatley, M.; Kumar, R.; Rodger, E. J.; Chatterjee, A.; Diermeier, S. D.; Drummond, C. J.; Reid, G.
Lung adenocarcinomas frequently harbour actionable oncogenic mutations that are vulnerable to treatment with targeted therapies. While responses to targeted therapies are often initially dramatic, relapse is almost inevitable and prevents durable responses in advanced-stage patients. Relapse is, in part, caused by drug tolerant persister cells (DTPs) which are able to survive treatment by entering a reversible, dormant state. Although long non-coding RNAs (lncRNAs) regulate processes thought to allow DTPs to survive and become stably resistant, the potential roles of lncRNAs in DTPs are largely unknown. In this study, we sought to investigate the expression of lncRNAs in in vitro DTP models of lung adenocarcinoma. We found that the lncRNAs Metastasis-Associated Lung Adenocarcinoma Transcript 1 (MALAT1) and Nuclear Paraspeckle Assembly Transcript 1 (NEAT1) were enriched in DTPs and that knocking down MALAT1 enhanced the effect of targeted therapies in both EGFR- and KRAS-mutant DTP models. To better understand pathways that MALAT1 might regulate in DTPs, bulk RNA-sequencing was performed and several pathways that may contribute to the actions of MALAT1 in DTPs were identified. Overall, our work describes a role for the lncRNA MALAT1 in DTPs in NSCLC and suggests that MALAT1 may be a novel target for the prevention of drug tolerance and subsequent resistance to targeted therapy in NSCLC.
Meduri, R.; Satish, A. L.; Singh, U.
Selective deployment of multiple transcription start sites (TSSs) is a major regulatory feature of human transcriptomes. FANTOM CAGE data exhibit a near-universal TSS deployment parsimony which is disrupted in cancers. We have recently shown that TSS deployment is sensitive to gene function, futile upstream transcription, and cellular biosynthetic states. Patterns in FANTOM CAGE data can reveal mechanisms underlying TSS co-deployments. We propose and test the possibility that some TSSs act like epromoters and serve as co-varying hubs of transcriptional activity for multiple other promoters. Using deep analysis of CAGE data implemented through neural networks, we show that non-cancers implement transcription co-deployments through cores of epromoter-like TSSs which are generally proximal to their start codons. These TSSs show enhancer-like TFBS profiles. A comparison with cancer CAGE data shows that the concentrated epromoter core is disrupted in cancers, with multiple distal TSSs replacing the proximal TSS cores. We provide evidence that the core TSSs are rich in YY1 and CTCF binding sites and associated with genes coding for transcription factors. Our findings show that covariance of TSS deployment is sensitive to transcriptional resource cost and that a parsimonious design of TSS co-deployments depends on proximal TSSs in non-cancers, a mechanism grossly disrupted in cancers. Highlights: Heterogeneous FANTOM CAGE data contain universal patterns of TSS co-deployments. TSS co-deployments exhibit a parsimonious "core-covariant" scheme which is disrupted in cancers. Core TSSs are enriched in transcription factor binding sites and gene functions which justify biological features of the samples. The DL pipeline we present identifies the core-covariant TSS sets in an unbiased manner.
Kant, S.; Masipeddi, S.; Bahadur, R. P.
Conformational plasticity of RNAs plays important roles in recognizing RNA-binding proteins, and is often modulated by their binding partners. Here, we investigate RNA conformational preferences in a non-redundant dataset of 263 protein-RNA complexes to characterize the structural landscape associated with protein recognition. RNA dinucleotide segments are analyzed using seven backbone torsion angles (δ1, ε1, ζ1, α2, β2, γ2, and δ2), two glycosidic torsion angles (χ1 and χ2) and the pseudo-torsion angle. Focusing on dinucleotide steps present in both interface and non-interface regions, we performed density-based clustering using selected backbone torsion angles to identify recurrent conformational states. We identify 28 distinct RNA dinucleotide conformers containing at least ten members each. Among these, eight conformers represent previously unreported nucleotide conformers (NtCs), including the transitional and the non-canonical states AB06, AB07, BB21, BB22, OP32, OP33, IC08 and IC09. Several of these conformers are preferentially enriched at protein-binding interfaces, suggesting their involvement in local conformational adaptation during protein-RNA recognition. The newly identified conformers span transitional A-B geometries, distorted B-like states, open conformations and compact intercalated structures, highlighting the remarkable structural plasticity of RNA in ribonucleoprotein complexes. Overall, this study expands the current understanding of RNA conformational space and provides a refined RNA dinucleotide conformer library for protein-RNA complexes. These findings will facilitate the identification of novel RNA structural motifs and improve RNA structural modeling, docking of protein-RNA complexes and deep learning-based prediction frameworks for describing RNA tertiary structures.
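Each torsion angle listed in this abstract is a dihedral angle defined by four consecutive atoms. As a minimal sketch (not the authors' pipeline), the function below computes a signed dihedral from four 3D coordinates with the standard atan2 formula; the name `torsion_deg` and the toy coordinates in the usage note are illustrative assumptions.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def torsion_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees, in [-180, 180]) for atoms p0-p1-p2-p3.

    For an RNA backbone torsion such as delta, the four points would be the
    coordinates of C5', C4', C3' and O3' of one nucleotide.
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    b2_len = math.sqrt(_dot(b2, b2))
    x = b2_len * _dot(n1, n2)
    y = _dot(b2, _cross(n1, n2))
    return math.degrees(math.atan2(y, x))
```

For example, `torsion_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))` describes a quarter turn about the central bond. Vectors of such angles per dinucleotide step are the kind of input a density-based clustering would operate on, keeping in mind that angles are periodic.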
Sugrue, R. J.; Sutejo, R.; Tan, B. H.
We prepared siRNA libraries against the H5N2 virus NP gene, and the PA, PB1 and PB2 genes that express the proteins that form the virus polymerase complex. The antiviral activity of the siRNA libraries in H5N2 virus-infected cells was initially assessed by using qPCR to measure the corresponding mRNA levels in the siRNA-treated cells. In this way, siRNA molecules within each library were identified that produced a greater than 70% reduction in the level of each target mRNA. A selection of these siRNA molecules was further evaluated for antiviral activity in a multi-cycle H5N2 MDCK cell model. The siRNA molecules identified were successful in blocking virus transmission and led to a reduction in progeny virus production. This antiviral activity correlated with both the inhibition of nuclear export of the newly formed RNP complexes that arise from the transcriptional activity of the input virus, and the inhibition of the polymerase activity of the newly formed virus polymerase complexes. This study highlights the potential use of siRNA as a strategy to block virus transmission by targeting the avian influenza virus polymerase complex.
Woolston, D. W.; Churchill, M.; Grandori, C.; Advani, A.; Yeung, C. C. S.
Purpose: Glasdegib is a Sonic Hedgehog (SHH) pathway inhibitor used for treating newly diagnosed acute myeloid leukemia in older adults or patients unfit for intensive chemotherapy. This study sought to demonstrate growth inhibition and increased apoptosis of B-cell acute lymphoblastic leukemia (B-ALL) in vitro under glasdegib, alone and combined with inotuzumab, using a novel co-culture system and a validated chemosensitivity testing model, to determine whether glasdegib with and without inotuzumab may represent a promising treatment strategy in B-ALL. Methods: Seven blood and marrow samples from B-ALL patients were co-cultured with HS-5 stromal cells in a co-culturing system designed to mimic the tumor microenvironment and maintain B-ALL cell viability for chemosensitivity testing under glasdegib and inotuzumab. Results: Co-culturing improved B-ALL viability from four to nine days. Dosage-dependent responses to glasdegib were consistent among B-ALL samples on day four based on culture viability, and varied with expression of the SHH genes GLI1, GLI3, SMO, and PTCH1. Combination with inotuzumab had varied effects on treatment response. Conclusion: Co-culturing B-ALL cells with HS-5 stromal cells improves B-ALL growth and viability. Treatment with glasdegib, with or without inotuzumab, affects the viability of co-cultured B-ALL cells by day four. SHH gene expression suggests that different B-ALL patients may be sensitive or resistant to glasdegib and inotuzumab.
Szmigiel, A.; Gesteira Costa Filho, I.; Campello, R. J. G. B.
Clustering single-cell RNA-seq (scRNA-seq) data and related protocols remains a major challenge due to high dimensionality, sparsity, and noise. Despite numerous benchmarking studies aiming to identify the most suitable clustering methods, many suffer from methodological flaws that can undermine their conclusions. A major challenge in benchmarking is selecting representative datasets that cover the diversity of scRNA-seq experiments and include laboratory-verified labels for reliable evaluation. Consistent preprocessing of all inputs to benchmarked algorithms is crucial, as it significantly impacts performance. Beyond selecting an algorithm, a thorough exploration of hyperparameters is also essential to assess robustness and identify configurations that maximize performance. We propose an improved benchmarking framework that addresses common methodological issues in prior studies. We illustrate our proposed methodology in a case study comparing the classic Leiden and Louvain clustering algorithms with extensive hyperparameter exploration on a carefully curated collection of real gold-standard datasets. By evaluating clustering performance across different hyperparameter selection scenarios, we show that benchmarking results can be misleading, either overestimating or underestimating performance depending on how the hyperparameter space is explored. In our illustrative case study, benchmarking results do not reveal any practically relevant performance differences between the Louvain and Leiden algorithms. In contrast, we show that overlooked factors such as graph construction and quality functions critically influence clustering outcomes, particularly under suboptimal settings of the numerical hyperparameters, namely the neighborhood size k used for similarity graph construction and the resolution hyperparameter in graph-based clustering algorithms. 
While noticeable trends have been observed in terms of how different (dis)similarity functions affect performance, the impact of this choice is limited and, to some extent, overridden by the graph-building approach. Across different graphs, there is a noticeable trade-off between achieving optimal performance with ideally tuned numerical hyperparameters and maintaining robustness under more realistic, unsupervised, and suboptimal settings. All in all, the analysis of our illustrative benchmarking case study offers clear guidance and objective recommendations for practitioners in the field. Most importantly, as the main contribution of this manuscript, our proposed framework sets a foundation for more reliable scRNA-seq clustering evaluation and benchmarking in future studies.
Sharma, M. K.; Chongtham, J.; Bhushan, A.; Chosdol, K.; Sinha, S.; Srivastava, T.
Glioblastoma (GBM) is the most aggressive primary brain malignancy, characterized by hypoxia-driven proliferation, therapeutic resistance, and poor prognosis. While hypoxia-induced transcriptional changes are well documented, the temporal regulation of cell cycle genes under sustained hypoxia remains unclear. This study profiled transcriptomic alterations in U87MG cells cultured under normoxia and graded hypoxia for one to three days. Differentially expressed genes (DEGs) were identified and analyzed using STRING, Cytoscape, MCODE, and CytoHubba to construct protein-protein interaction (PPI) networks and extract hub genes. Functional enrichment was assessed through DAVID, ClueGO, and KEGG, while prognostic relevance was evaluated using GlioVis and ONCOMINE datasets. qRT-PCR validated expression of selected hub genes. A total of 294 DEGs were identified, forming two main functional modules enriched in cell cycle regulation and chemokine signaling pathways. Eighteen hub genes (KIF20A, CCNB1, AURKA, EGR1, CDCA3, CENPF, CDCA2, ASPM, KIF11, CCL2, CCNA2, DLGAP5, RACGAP1, TPX2, PTGS2, CTGF, and KIFC1) were significantly associated with mitotic processes and GBM progression. Survival analysis demonstrated that 17 of these genes correlated with poor overall survival (p < 0.05). qRT-PCR confirmed that hub gene expression peaked during early hypoxia and declined with prolonged exposure, indicating dynamic regulatory adaptation. These findings identify key hypoxia-responsive genes governing cell cycle progression and highlight their prognostic and therapeutic potential in glioblastoma.
Gayraud, G.; Davila Felipe, M.; Padiolleau-Lefevre, S.; Maffucci, I.; Issouani, E. M.; Guerin, M.; Da Ponte, H.
Aptamers are single-stranded DNA or RNA molecules selected for their high affinity and specificity in binding target molecules, similar to antibodies. They are commonly selected through the SELEX process, which iteratively exposes a random sequence library to a target and retains the sequences showing good binding properties. To improve Lyme disease detection, we propose designing aptamers that specifically bind to the CspZ protein on the surface of Borrelia burgdorferi, the bacterium responsible for the disease. Starting with a SELEX process consisting of thirteen rounds, from which selected in vitro sequence candidates have emerged, we aim to propose a holistic process that selects new sequence candidates in silico, which are then validated experimentally. Our approach relies on 1) using Machine Learning (ML) techniques, specifically a Restricted Boltzmann Machine (RBM), to digitally replicate the last round of the SELEX process, 2) integrating insights from text analysis methods, such as word2vec and n-grams, into the RBM model trained on the final-round SELEX dataset to represent and compare newly generated sequences with in vitro candidates, 3) selecting in silico sequences with strong potential to bind to the CspZ protein, and 4) experimentally validating the in silico sequences selected in step 3. Our holistic approach combines biological insights with statistical models to improve the efficiency and outcome of the SELEX process. We enhance the RBM model, designed to replicate the distribution of the final SELEX round, by integrating geometric representations of sequences, which is especially advantageous when dealing with limited datasets relative to the vast sequence space. In addition, the approach provides in silico sequence candidates with strong binding properties.